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Abstract 

Known results on the moments of the distribution generated by the two-locus Wright- 
Fisher diffusion model and a duality between the diffusion process and the ancestral process 
with recombination are briefly summarized. A numerical methods for computing moments 
by a Markov chain Monte Carlo and a method to compute closed-form expressions of 
the moments are presented. By using the duality argument properties of the ancestral 
recombination graph are studied in terms of the moments. 
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1. Introduction 

In classical population genetics theory behavior of frequency of a gene type (allele) 
has been a central issue (for example, [2]). The fate of the allele frequency has been 
modeled by a diffusion process, where the population size is assumed to be sufficiently 
large. The diffusive limit, which is called as the Wright-Fisher diffusion model, is expected 
to illustrate actual evolution of the allele frequency in the population. Numerical methods 
for computing likelihood of a sample taken from the equilibrium have attracted much 
interest (for example, [22]). Explicit and closed- form expressions of the whole process 
have importance by their own right. Unfortunately, their availability has been limited. 
For the one-locus two-allele model without mutation and other evolutionary forces closed- 
form expressions of the probability density of the allele frequency at a fixed time were 
obtained in terms of orthogonal polynomials [16j.|10j. In contrast, for two-locus models 
known closed-form expressions have been limited to several moments of the distribution 
generated by the diffusion process [20], [H]. A comprehensive survey at early 1970's, 
which is still useful, is [15j. Recently, closed-form expressions of a class of moments were 
obtained in terms of orthogonal polynomials |17j . 

The concept of duality has been a powerful tool in stochastic analysis of interacting 
particle systems [U] . In population genetics theory the moment dual of the Wright-Fisher 
diffusion model was firstly used in [21j. A genealogical process of a sample taken from a 
population, which is known as the coalescent [12], has been useful for population genetic 
data analyses. The duality was used to obtain branching-coalescent processes as models of 
natural selection [13] and conversion bias [19] . Ancestral processes are process of numbers 
of ancestral lineages in a section of ancestral graphs, which are analogues of the coalescent 
genealogy. The dual of the one-locus two-allele model Wright-Fisher diffusion model with 
directional selection [11] is an ancestral process, which is the number of ancestral lineages 
in a section of the ancestral selection graph. The dual is a birth and death process with 
linear birth and quadratic death rates. It was demonstrated that properties of the birth 
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and death process can be studied by referring to classical results on the Wright-Fisher 
diffusion model [18]. For the multi- locus model an analogue of the coalescent genealogy, 
called the ancestral recombination graph (ARG), was introduced [7J. The two- locus ARG 
integrates marginal genealogies at the two loci. The ancestral process, which is a process 
of the numbers of ancestral lineages in a section of the ARG, is the dual of the two- locus 
two-allele Wright-Fisher diffusion model [5]. 

In section 2, we briefly summarize known results on the moments of the distribution 
generated by the two-locus two-allele Wright-Fisher diffusion model. In section 3, the mo- 
ment duality between the diffusion process and the ancestral process, which is a process 
of the numbers of ancestral lineages in a section of an ARG, is introduced. A numerical 
method for computing moments at a fixed time by a Markov chain Monte Carlo is intro- 
duced. In section 4, method to compute closed-form expressions of the moments by using 
an ARG terminology is presented. In section 5, by using the duality argument properties 
of the ARG are studied in terms of the moments. 

2. Summary of known results on the moments 

Consider a random mating monoecious diploid population consisting of individuals. 
Two linked loci A and B are segregating, where recombination fraction between the two 
loci is r. Pairs of alleles Ai,A2 and Bi, B2 are in the loci A and B, respectively. A diffusive 
limit is measuring time in units of 2A^ generations and 2A^ — )• 00, while p = 4Nr is kept 
constant. Let frequencies of gametes AiBi, A1B2, A2B1, and A2B2, be respectively, xi, 
X2, X3, and I — X1—X2 — X3. Frequencies of the alleles Ai and A2 are denoted by x and 1 — x, 
respectively, and those of the alleles Bi and B2 are denoted by y and 1 — y, respectively. 
Then x = xi + X2 and y = xi + X3. Set z = xi{l — xi—X2 — X3) — X2X3, which is a measure 
of association between x and y. The limiting diffusion process {xi{t),X2{t),X3{t);t > 0} 
is defined in a simplex 

K : < xi < xi + X2 < xi + X2 + X3 < 1. 



5 

Let H = ^(K), where <I'(xi, X2, X3) = {x,y,z) is a C°°-diffeomorphism of K onto H. The 
generator of the diffusion process {x{t),y{t), z{t);t > 0} in H is [20] 



x{l - x) 92 2/(1 - y) 92 92 , s 5^ o ^ 



2 2 9x9y dxdz dydz 

(2.1) (1 + ^ + i {xy(l -x)(l-y)+ z{l - 2x)(l - 2y) - z^] ^. 

In the classical population genetics theory some problems of general interests are con- 
cerning events of fixation. The probability of eventual fixation of an allele and the proba- 
bility density of the time to the fixation have been studied. Some of these properties can 
be studied in terms of moments of the distribution generated by the model by using mo- 
ment inversion formula. The probability of eventual fixation of a gamete in the two-locus 
two-allele Wright-Fisher diffusion model governed by the generator ()2.ip is obtained im- 
mediately by the moments. In fact, since the stationary density is atomic, lim E.[x(t)y{t)] 
gives the fixation probability of the gamete AB. For an allele two types of fixation can be 
defined; first fixation occurs when the first of the four alleles is lost and the final fixation 
occurs when an allele at the other locus is lost (a gamete fix). These fixation times are 

Ti = inf{t>0;x(t)(l-x(t))y(t)(l-y(t)) = 0}. 
To = mi{t>0;x{t){l-x{t))+y{t){l-y{t))=0}, 

respectively. The probability densities are 

P[ri <t] = hm E [{1 - x{t){l - x{t))yiy){l - y{t))r] , 
P[ro<t] = hm E[{l-x(t)(l-x(t))-y(y)(l-y(t))r], 

n~>oo 

respectively. It seems impossible to obtain explicit and closed-form expressions of these 
limits (see Section 4). Nevertheless, some of the moments whose closed-form expressions 
are available are useful to obtain upper bounds and approximate formula of these prob- 
abilities [E]. By the same reason a closed- form expression of the joint distribution of 
{x{t),y{t), z{t)) at a fixed time t is not available. 
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Let us introduce a classification of the moments of the distribution generated by the 
generator (j2.ip . 

Definition 2.1. The rank and class of a moment, which is an expectation of a monomial 
x^u^Xi, l,m,n € are I + m + 2n and n + min{Z, m}, respectively. The rank is equals 
to or larger than twice of the class. 

Remark 2.2. The class-zero moments have closed-form expressions and they are the 
moments of the one-locus Wright-Fisher diffusion model [lOj. The class-one moments 
have closed-form expressions [TT] (see below). 

Other moments whose closed-form expressions have been obtained are expectations of 
a type of polynomials: 

Lemma 2.3 (Lemma 3.6.1 of [15j). The manifold of polynomials spanned by the set of 
polynomials {x'(l — j;)'y"^(l — y)™2;"(l — 2x)"(l — 2y)"}, where a = 0,1 and l,m > if 
n = 0, is closed under the operation of C. 

Remark 2.4. The polynomials have zero on the boundary of the square x{l — x)y{l — y) = 
and z = 0. Known closed-form expressions for the moments of polynomials of this type 
are {l,m,n,a) = (1,1,0,0), (0,0,2,0), and (0,0,0,1) by [2D]. Expressions for {2,1,0,0) , 
(1,2,0,0), (2,0,1,1), (0,2,1,1), (2,0,2,0), (0,2,2,0), (2,2,0,0), (1,1,1,1), (1,1,2,0), 
(0, 0, 3, 1), and (0, 0, 4, 0) are given by [15], where the expressions involve eigenvalues whose 
closed-form expressions are not available. 

In |17] a closed-form expression of E[2;i(t)|a;(f) G (0, 1)] was obtained by using a limit of 
a closed- form expression of a class-one moment. It yields an expression of the conditional 
covariance between x and y given that alleles Ai and A2 are segregating in the locus A. 
This expression has importance in interpreting observable polymorphism in population 
genetic data analysis. Here, we summarize some results in [17] . because the explicit 
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expressions are used in the later sections. If the argument equals to unity, the truncated 
hyper geometric series 

is expressed by the generalized hyper geometric series [l] 

, , T(a + n + l)T(b + n+l) ^, , 

ynia, b, c; 1) = — — — — — 3^2(0, b, c + n;c,a + b + n + 

n!r(a + + n + 1) 

where 3-F2(") is the generalized hyper geometric series. A trivial but useful identity is 



Lemma 2.5 ([T7]). For m, n G 2+ and a,b,c G C, 

n]T(a + b + n + l) 



r(a + n + l)r(6 + n + l) 



yn{a, b;a + b + m + 1-1) 



2.2) = , ^ ——y^{a,b;a + b + n + l;l). 

1 (a + m + 1)1 (0 + m + 1) 

Remark 2.6. Ifm = 0, it gives an identity 



2Fi{a,b;a + b + l;l) = 3^2(0, 6, a + 6 + n + 1; a + 6 + 1, a + 6 + n + 1; 1) 

n\r{a + b + n + l) 



r(a + n + l)r(6 + n + l 
r(a + 6+l) 



-yn{a,b,a + b + l;l) 



r(a + l)r(6 + l)' 
which is a spacial case of the Gauss hypergeometric theorem [Ij . 

An expression of a power of p in terms of the Gegenbauer polynomial follows by the 
orthogonal and complete property [17j : 

n+2 

2(2m - 1) ) 

^ jm+l 



(2.3) p^=Y^ 2{2m - (-l)"^T^_^(l _ 2p), n G Z+, 

(-) 

where Z!m(') is the Gegenbauer polynomial, which also denoted by (7^^ (•). [n]m and 
{n)m are falling and rising factorials, respectively. By using this expression closed form 
solutions of systems of differential equations for class-one moments were obtained. Let 

W,,n,n(i) = Ep^Mtyy{trz{tr] for l,m,ne Z+. 



Proposition 2.7 (([T7])). For n e 

n+2 



m=2 

Proposition 2.8 ([17J). For n G Z+, 



^ (n + Ijm+l 



„ r n— 1 n 

,1,0(0 = M + ^ + E + E Fi-'e- 



2 + P 

m=l m=l 



except for p = [k + m){k — m — l),k = m + 2, m + 3, n; m = 1, 2, n, where 



(n) 



m+l 



^^^^p(l-p)gT^,(l-2p) 



r^(i-2p) ^ r^.2(i-2p) 

2(m + 1) + /9 2m — p 



and 
(2.4) 



_ 0. . ^m Nm [ 1 1 (n - m) (n - m - 1) ] , n - 9r.W 

^ ^ (n)„\2m + p^2(m + l)-p(n + m)(n + m + l)J ' 

with conventions that the first sum is zero if n = 1 and T}_i(-) = 0. 



Proof A sketch of the proof was given in |17] . A system of differential equations for the 
moments gives 

_ N„^+i!(2m + l)! (^) „ T 

(nWi(m + l)!m!^™+i' ^ ^ Z+, m - 1, 2, n - 1, 

and 

{(n + ,„)(n - m - 1) - /,} Fi™) = 4n(2m + l)i!i^lkzl(_i)-+iT^_^(l _ 2p)d 
(2.5) +n(n -1)F^™J, n G Z+; m = 1, 2, n, 
with the initial condition 

n-l n 

p> =M+ ^ + E e'^'^ + E ^ ^ 

m=l m=l 

By using (12. 2p with m = 1, 2, it is straightforward to solve (12. 5p for Fn™'' and the solution 
is (|2.4|) . except for p = {k + m){k — m — 1), A; = m + 2, m + 3, n; m = 1, 2, n (The 
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exceptional values given in \T7] are incorrect). By setting E'n"^^ = qE^^^ {p) + dEl^^{p), 
the initial condition gives 

n— 1 n—2 

m=l i=0 

n-2 i+2 



Pip - 1) E E 2(2- - l)f^^(-lrt-2(l - 2p) 

i=0 m=2 + -^^"^+1 



n-1 



p(p-l)^2(-l 



(2m)! 

m=l ^ ^ 



n— 1 

M m+i m + 1 



xyn-ni-i{m + 1, m, 2m + 2; l)r^_i (1 - 2p) 

n-1 

(n)^+i m(m + 1) 

for each n = 2, 3, where the second equality holds by (|2.3p and the last equality holds 
by (j2.2p with m = 0. The expression of summation of E^^^ [p) follows by re-arrangement 
of terms. □ 

3. The duality and a numerical method for computing moments 

The process of the numbers of lineages including non-ancestral lineages (see below) in 
a section of a two-locus ARG is a birth and death process [7j. When there are i lineages, 
the birth rate is ip/2 and the death rate is i{i — l)/2. The birth and death process is 
identical to the numbers of ancestral lineages in a section of an ancestral selection graph, 
and the moment dual of the birth and death process is the Wright-Fisher diffusion of the 
one- locus two-allele model with directional selection [18j . Note that, an ARG involves 
gametes whose alleles are not ancestral to any allele in a sample. We denote alleles which 
are not ancestral to any allele in the sample by — , since in principle the allelic state 
of the locus cannot be specified by the sample. For example, a gamete AB could be a 

recombinant descendant of A—, which in turn could be a recombinant descendant of . 

The gamete is involved in the ARG, nevertheless, whose alleles are not ancestral to 

any allele in the sample. In this paper, we discuss the ancestral process which generates 
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numbers of ancestral lineages in a section of an ARG. The ancestral lineages are a subset 
of lineages in an ARG. The stationary distribution of the process with infinitely- many- 
allele mutation model was studied by [6j. A whole graph Q includes marginal genealogies 
7a and Tb at the locus A and B, respectively. Denote the edges of a graph by E{-). 
Edges of g are partitioned into A = E{g) n £^(7:4) n E{TbY, ^ = E{G) n E{TaY H E{Tb), 
C = E{g)r\E{TA)r\E{TB) and V = E{g)r\E{TAY^E{TBY- We can A, B and C ancestral 
lineages. T> are not ancestral lineages since they are not ancestral to any allele in a sample. 
Let 8t be the edges of a section of Q taken at time t backwards. Denote the number of 
ancestral lineages by a{t) = lirtn^l, h{t) = \£t^B\, c{t) = \£t'^C\. The marginal transition 
rates of {a{t) ,h{t) , c{t)) do not depends on \£t nP[ and is Markovian [5l[7]. The rates are: 



(3.1) 



(a,6,c) 



(a + 1,6+ l,c- 1), C/o/2, 

(a — 1, 6 — 1, c + 1), ah, 

(a — 1,6, c), ac + a(a — l)/2, 

(a,6-l,c), 6c + 6(6-l)/2, 

(a,6,c-l), c(c -l)/2. 



with a + 6 + c > 1. The backward equation for the joint probability generating function 
6,m,n(i) = Ez,m,n[p''^*^g^^*^S'''^*^] of the Markov chain {a{t),b{t),c{t);t > 0} on the integer 
lattice \0 is 



Lin,n 



dt 



(3.2) 



(l + m + n){l + m + n — 1) -\- np np 

2 M,m,n H ^y+l,m+l,n— 1 

/(/ — 1 + 2n) m{m — 1 + 2n) n{n — 1) ^ 

~l ^ M— l,m,n H ^ M,m— l,n H ^ Ql,m,n—l 



for {l,m,n) G Z']_\0, where terms whose subscripts have negative integer are zero. It is 
straightforward to see that the moments fi^m^n 

(t) = Ep,g,g[x(t)'y(t)™xi(t)"] also satisfy the 
system of the differential equations (|3.2p . Therefore a moment duality follows immediately 
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[5]. We give a proof, since it is useful to introduce a numerical method for computing 
moments. 



Lemma 3.1. The diffusion process {x{t),y{t), z{t);t > 0} in H with {x{0),y{0), xi{0)) = 
{p,q,g) and the Markov chain {a{t),b{t),c{t);t > 0} in Z']_\0 whose transition rates are 
\3. 1]) with (a(0), 6(0), c(0)) = {l,m,n) are dual to each other: 

Ep,,Jx{tfy{trxi{tr] = E,,^,„[p'^Wg^W/W]. 

Proof. The system of the differential equations (j3.2p is equivalent to an integro-recurrence 
equation 

6,m,nW= / ni,m,n{s)e-^''"''"^'~'^ds, l,m,n e Z+, 
Jo 

where 

np 1(1 — 1 + 2n) m{m—l+2n) 

I Q,m,n — ~^Q+l,m+l,n— 1 H ^ si— l,m,7i H 2 s/,m— l,n 



(3.3) 



n(n — 1) ^ 

H Ql,m,n-1 + '"T'4«-l,m-l,n+l, 



and 7i^m,n = {{l + m + n){l + m + n — l) + np)/2. The integro-recurrence equation is recast 
into 

(3.4) 6,m,n(i) = /" ^l''>TT''n'\l,m,n]^i'^m\n'{t - s)-fi,m,ne~'^''"'-''''ds, 

l',rn',n' 

where the transition probability is given by dividing the rates in (j3.ip by ^a,b,c- Meanwhile 
E,,^,„rW</'W5^W] = E,,„,„{e [p"Wg^W5^W|(a(5),6(5),c(s)) = (/',m',n')]} 



,m',7i' 

Ez,m,n['^Z',m',n'(* ~ ^)]) 



pa{t~s) ^b{t~s) gC{t-s) 



□ 



where the second equality follows by the strong Markov property. This expression is 
equivalent to (|3.4|) . Therefore Ci,m,n{i) = '^i,m,n[p^^^\^^^^ 9^^^^]- On the other hand, it is 
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straightforward by Ito's formula to see that vi^m^n 

{t) = Ep,g,g[x(i)'y(t)™x(t)"] satisfy the 

system of differential equations (j3.2p . 

The duality relation is useful for numerical computation of the moments yi^m,n{t) by 
simulating independent copies of {a{t) , b{t) , c{t)) by the Markov chain Monte Carlo with 
the transition probabilities (|3.ip . Consider the simulations stopped at a time t. The 
average over p'^(^) q^(^) g'^(^) of the copies is then an unbiased estimator of the moment 
i^i,m,n{t)- A similar method was used for computing likelihood of a sample in a varying 
environment [8]. The simulation can be stopped before t. Let a hitting time be 



where S = {(0, 0, 1), (1, 1, 0)} is the closed set of states from which a chain cannot exit. If 
T <t, 



T = inf {s > 0; (a(s), b{s), c{s)) e 5} 



{Eb'^Wg^W5'=(*)|(a(r),6(r),c(r) = il',m',n')]] 




P/,m,n[(«(r),&(-r),c(r)) = (0,0,l)]i/o,o,i(*-^) 



+IP«,m,„[(a(r),6(r),c(r)) = (1, 1, 0)]z.i,i,o(t - r) 



where the second equality follows by the strong Markov property, and 




and 



i^i,i,o{t -t) =pq + 



2(9 -pq) 
2 + p 




Prom these observations we have a numerical method for computing moments: 



Proposition 3.2. Set a Markov time a = t At, where x Ay = min{x,y}. An unbiased 
estimator o/i// „j „(t) is the average over following values obtained by independent copies of 
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(a(cj), 6((t), c((t)) simulated by the Markov chain Monte Carlo with the transition probabil- 
ities (E2P.- ifcr = t, the value is p<^) qK^) g^'') , If a = t and {a{a) , b{a) , c{a)) = (0,0, 1), 
the value is fo,o,i(i — ''")• -(f f = t and (a(cr), b{a), c(cr)) = (1, 1, 0), the value is i^i,i,o(i — ''")• 

4. Closed-form expressions of moments 

Since the system of differential equations for the moments of the distribution gener- 
ated by the two-locus two-allele Wright-Fisher diffusion model (|3.2|) is that for the joint 
probability generating function of the distribution of the numbers of ancestral lineages in 
a section of a two-locus ARG, relationship among moments can be specified in terms of 
events on the ARG. In (j3.3p . the first event is a recombination, the second and the third 
events are marginal coalescence in Ta and Tb, respectively, and the forth event is a joint 
coalescence in both of Ta and Tb- We call the fifth event as a null coalescence, because 
coalescence events occur in neither Ta nor Tb- The following lemma is obvious. 

Lemma 4.1. The manifold of moments spanned by the set of moments whose ranks and 
classes are equals to or smaller than specified values is closed under the operation of T ■ 
Neither of class and rank of a moment change under recombination and null coalescence 
operations. 

It was shown that the moments /^i^m,n 

{t) can be obtained recursively from the smaller 
rank moments [IBj. We present a method to compute closed- form expressions of the 
moments i'i^m,n{t) by using the ARG terminology, since it gives systematic insights in the 
computation. In our approach class of a moment is essential. Since closed-form expressions 
of all moments whose classes are less than two are available (see Section 2), let us start 
with moments whose classes and ranks are two and i (> 4), respectively: z^i-4,o,25 t'j-3,i,i) 
i^j-2,2,0i ^^0,4-4,21 i^i,j-3,i) and 1^2,1-2,0- The former three and the latter three moments are 
closed respectively by recombinations and null coalescences. The expressions for the latter 
three moments are obtained immediately by exchanging p and q in the expressions of the 
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former three moments. The system of three differential equations for the former three 
moments whose ranks are j (4 < j < i) are 



d 
di 



( u \ 



I (j-2)(j-3)+2p 
2 

a - 3) 




(i-l)(i-2)+p 
2 







2(i 



^ ( ^'j-4,0,2 ^ 
V ^i-2,2,0 / 



(4.1) 



+ 



V 





2 




(i- 


-2)(i- 


-3) 




2 






-2)(i- 


-3) 



'^i-5,0,2 
^^^■-4,1,1 
^^^■-3,2,0 / 



+ 



^i-3,0,1 
V ^^i-2,1,0 / 



where = by a convention. The solution involves eigenvalues of the matrix in (j4.ip . 
which are roots of the cubic equation 



3 , 3j2-9j + 8 + 3p^2 , f3(j-l)'(j-2)2 



+ (3j2 - iij + I5)p + A 



2 1^ 2 

J(J - - 2)'(j - 3) , (3j4 - 22j3 + 65j2 - 86^ + 48);9 + - 5j + 8)p^ 



+ 



4 



0. 



If j = 4 the system of differential equations ()4.ip involves fo,o,2) ^^2,2,0 and class one 

moments, we can obtain the closed-form expressions. In fact, the closed-form expressions 
of the moments whose classes and ranks are two and four, respectively, were obtained by 
|20j . Solving (14. ip iteratively in j = 5, 6, i, we obtain the closed form expressions of the 
moments whose classes and ranks are two and i, respectively. 

Computation of the moments whose classes and ranks are A; (> 3) and i{> 2k), re- 
spectively, is in the same manner. To obtain closed-form expressions of these moments a 
system of /c + 1 differential equations must be solved. The solution involves eigenvalues of 
a tri-diagonal matrix Aj. with 

{Ak)s,s-i = {s-l){j -2k + s-l), 

{j-k + s-l){j-k + s-2) + {k-s + l)p 

\ kjs.s 2 ' 

k - s + 1 
\Ak)s,s+l — 7, 
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for 1 < s < A: + 1 and 2k < j < i. The eigenvalues are roots of the {k + l)-th degree 
algebraic equation. If A; > 4 we cannot expect explicit closed-form expressions of the 
eigenvalues. The computation eventually involves moments whose classes and ranks are t 
and u, respectively, where 1 < t < k and i — k + 1 < u <2t. 

5. Properties of ARG 

We have been discussed how to compute moments of the distribution generated by the 
two-locus two-allele Wright-Fisher diffusion model. The moments are useful for studying 
the two-locus ARG, since the moments of the diffusion, i'i^m,n{t), is the joint probability 
generating function of the distribution of the numbers of ancestral lineages in a section of 
an ARG of a sample (a(0), 6(0), c(0)) = (/, m, n). We define rank and class of the numbers 
of ancestral lineages in a section of an ARG, (Z, m, n), hy I + m + 2n and n + min{/, m}, 
respectively. An ARG of a class-zero sample is a marginal genealogy, whose properties are 
well known. In the following we consider a sample whose class is larger than zero. Since 
Lemma 13.11 gives 

limWO = -^ + ^, n + min{/,W>l, 
t^oo 2 + p 2 + p 

the stationary distribution of the numbers of ancestral lineages in a section of an ARG is 

^5(0,0,1) + ^<^(1,1,0). 

The distribution of the number of ancestral lineages in a section of an ARG of a sample 
(0, 0, 2) can be obtained from the known closed-form expression of the moments of the 
distribution generated by the two- locus two-allele Wright-Fisher diffusion model [20]. It 
seems that general formula (applicable to all rank moments) of the distribution of a sample 
whose class is larger than one are not available. In contrast, we have general formula of 
the distribution of class-one samples. The closed-form expression can be obtained by using 
closed-from expressions of the moments with a finite series expansion of the Gegenbauer 
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polynomial [3]: 
(5.1) 



1 ^ {-m)i{m + l)i+2 



1=0 \ ' f 



Let i^fc,i,o(t) = T,i,m,nfi,rn,n{^)P^9"^9'^' wheve fl^m,n{t) = k,ifi[{a{t) ,h{t) , c{t)) = {l,m,n)]. 
The distribution of a sample (1,1,0) is 

/i,i,o(t) = + ^e-^S /o.o,i(t) = ^(1 - e-^*). 



2+p 2+p 



2 + p' 



smce 



2{g — pq) . -'i±p.tx 
^1,1.0 =Pg+ 2 + 7^ ^ ^' 

For samples (/c, 1,0), k > 2, from Proposition 12.81 and (15. ip the closed-form expressions 

have general formula. For i > we have 

2 



fi,o,i{t) 



2 + p 



Si,o + gi,k{t) 



k-l 



(-l)-[A:]„+i 



_ {k)m+iil{i + ly. 
and for i > 2 we have 



{-m)i{m + l)i+ 2 , (2 - m)i(m - l)i+2 
2(m + 1) +p 



+ 



2m — /9 



m{m+ 1) , 



fc-i 



Ai,o(i) = -gi-i,k{i) - ^ 



-l)'"[/i:] 



m+l 



^ (fc)^+i(i-l)!i! 



{(2m + 1)(1 -m)i_2(m)i 



+ 



(-m)j_i(m+ l)j+i (2 - m)i_i(m - l)j+i 



2(m+ 1) + p 



+ 



2m — p 



m(m4-l) 1 



and 



where 



/i,i,o(i) 



2 + p 



5o,fc(i) 



A;-l 



-1)™[A:] 



m+l 



(^) 



m+l 



2m + 1 



(m+l)(m + 2) (m — l)m 
2(m + 1) + p 2m — p 



m(m+l) 



9i,k{t) 



k-l 

E 



(-ir[^]r 



1 (A;)r 



1 



+ 



1 



(k — m){k — m — 1) 



2m + p 2(m + 1) — p (A; + m)(/c + m + 1) 



(1 — m)j(m)j_|_2 m(m + l)+p ^ 

z!(i + l)! ' ■ 
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We can obtain closed- form expressions of the distribution of samples a (A;, 0, 1), /c > 1 in 
the similar manner. 

Let the waiting times until common ancestors of 7a and Tb, respectively, be 

Wa = inf{s > 0; a{s) + c{s) = 1}, Wb = inf{s > 0; h{s) + c{s) = 1}. 

Proposition 5.1. The waiting time until a sample has common ancestor at hath of the 
two loci is given by 

^l,m,n [Wa VWB<t] = Pz,™,„[(a(t),5(t),c(t)) G S], 

where x y y = max{x,y}. The waiting time until a sample has common ancestor at one 
of the two loci is given by 

P^,n^,„ [Wa AWB>t]= Yl P«,m,n[(a(t), b{t), c(t)) = (/' ,m',n')]. 

n'+mm{l' ,m'}>2 

Remark 5.2. A recursion of the expectation ofWA'^WB for the ARG of a sample (0, 0, c) 
is given by Theorem 4 of [7\ . Theorem 5 of [7\ gives a closed-form expression of the joint 
Laplace transform of Wa V Wb and Wa A Wb for the ARG of a sample (0, 0, 2). 

The idea of the number of recombination events in a sample was introduced by [9]. 
The number of recombination events on the two-locus ARG including non-ancestral lin- 
eages was considered by [51 E] , and a closed- form expression of the probability generating 
function of the number of recombination events was given. Here, we consider the num- 
ber of recombination events on ancestral lineages of an ARG. Let s(t) be the number of 
recombination events occurring to C lineages of an ARG in a time interval (0,t). The 
recombination events are subset of the recombination events occurred on whole lineages 
of the ARG. 



Lemma 5.3. The joint probability generating function of {a{t),b(t),c{t),s{t)) is 
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where {ay{t),b^{t),Cv{t);t > 0} is a modified process of {a{t),b{t),c{t);t > 0} in which the 
recombination fraction is rv, where < v < 1. 

Proof Let 0,m,n(t) = Ei,„,„,o[p"^*^9^^*^c/''^*^^^*^*^]- For (/,m,n) G \0, we have 
d(i,m,n _ {I + m + n){l + m + n - 1) + nvp nvp 

^ ^l,m,n ' ^ ^Z+l,m+l,n— 1 

l{l — \ + 2n) m{m — 1 + 2n) n{n — 1) . 

H ^ Q— l,m,n H ^ l,n H ^ Q,m,n—1 

(5.2) +/mC/-l,r)i-l,n+l ^ 0,m,n 

with the mitial condition ^/,m,n(0) = p^q^g"^. This is uniquely solved by means of the 
Feynman-Kac formula and the result is Lemma 15.31 □ 

Theorem 5.4. The conditional probability generating function of the number of A lineages 
in a section of an ARG of a sample (n, 0, 1) given that no recombination events occur in 
a time interval (0, t) is 

En,o,i,ob"^*^|s(i) = 0] = z>„,o,iW, 
where i^n,o,i{t) is Un,o,iii) with setting q = g = 1 and p = 0. 

Proof. By (j3.ip we see that bo{t) = and co{t) = 1 for all t, and the marginal process 
{ao(t);t > 0} is a death process with death rate i{i + l)/2 when ao(t) = i- The joint 
probability generating function of (a(t), b{t),c{t)) given that no recombination events occur 
in a time interval (0, i) is 



limE„,o,i,or(*)9'^*^/^*^^^'^*^] = E„,o,i,or^*^9'^*^/^*\KO = 0] 

v—>-0 

t 



E. 



n,0,l,0 



pMt)qbo{t)gCoit) |-| ^ Coiu)du^ 



,"o(t)lp-|i 



= gEnfiXoiP^'ne 

where the first equality follows by Lebesgue's convergence theorem and the second equality 
follows by Lemma [5.31 Setting p = q = g = 1, we have Pn,o,i,o[s(0 = 0] = e~^*/^, 
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while setting q = g = I we have E„,o,l,ob"^*^ = 0] = z>n,o,iW^ where z>n,o,i(0 = 
E„,o,i,ob"°^*^]- □ 

Remark 5.5. Explicit closed-form expression of v'n,o,i(t) is available in Section 2, since 
^n,o,i{t) = /Un+i,i,o(i) + ^n,o,i(i)- This expression follows immediately by considering the 

ARG. Recombination might occur on the single C lineage. By Poisson nature of recom- 

pj. 

bination events, the probability that no recombination occur on the single lineage is e . 
The marginal process {ao(t);t > 0} follows the death process independently. 

Lemma 5.6. Let S be the absorbing states. The probability generating function of the 
number of recombination events on ancestral lineages of an ARG until a sample has com- 
mon ancestor at both of the two loci is 



exp 



p{l-v) 



j-Tv 

/ Cy{u)d% 
Jo 



where Ty = inf{s > 0; (a^(s), 6„(s), c„(s)) = S}. 



Proof Let 0,m,n = ^l,m,n,o\p''^^h''^^'^9''^^^v'^% For {l,m,n) G Z^\0, we have 
{I -\- m -\- n){l -\- m -\- n — 1) -\- nvp nvp 

" ~ ^ Ql,m,n ~l ^ M+l,m+l,n— 1 

Z(l — 1 + 2n) m(m — 1 + 2n) n(n — 1) 

"I 7^ Q—l,m,n H n Q,m—l,n H n Q,m,n—1 



(5.3) 



^Imr n{l-v)p 



with the boundary condition ^o,o,i = 9 ^'^d Ci,i,o = PQ- This boundary value problem is 
uniquely solved by means of the Feynman-Kac formula. That is 



Cl,m,n 

= 9^l,m,n,0 
+Pq^l,m,n,0 



exp< - 



p{l-v) 



Cy{u)dv}^ , {ay{Ty),by{Ty),Cy{Ty)) = (0, 0, 1) 

p(l - v) f^" ] 

exp <; J Cy{u)dU I , {ay{Ty),by{Ty),Cy{Ty)) = (l, 1,0) 
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On the other hand we have 

^i^m^uM'^^U'^^^g^'^^^v'^^^ = 5IE^,n^,n,oK^"\ (a(T),6(r),c(r)) = (0,0,1)] 

+pgEz,^,„,ob'^"\(a(T),6(r),c(r)) = (1,1,0)]. 

Thus the probabihty generating functions of the number of recombination events on an- 
cestral hneages of an ARG until a sample has common ancestor at both of the two loci 
with the given state in which the sample path absorbed are 



^i,m,nM"\{a{T)Mr),c{T)) = (0,0,1)] 
p{l-v) 



E 



l,m,n,0 



exp 



Cv{u)du> ,{a^{Ty),by{Ty),Cy{T^)) = (0,0,1) 



and 



Ei,rn,n,ob^^"\ (a(r),6(r),c(r)) = (1,1, 
p{l-v) p 



E 



l,m,n,0 



exp 



Cy{u)du ) , (a^(rj,),6^(rj,),Ct,(rt,)) = (1, 1,0) 



Lemma 15.61 follows as the summation of these two probability generating functions. □ 

Corollary 5.7. The expected number of recombination events on an ancestral lineages of 
an ARG until a sample has a common ancestor at both of the two loci is 



Ei,m,n,o[s(r)] = ^:E 



l,Tn,n,0 



c{u)du 



P 
2 



E 



{V ,m' ,n')&\\{0,S} 



k'^Lm,nfl[{a{u),h{'^),c{u)) = {l' , m' , n')]du. 



Proof. Let Tii^rn',n' be the sojourn time of a sample path of the process of the numbers of 
ancestral lineages in a section of an ARG of a sample (/, m, n) stays at a state (/', m', n') ^ 
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S. Then 



,m,n,0 



c{u)du 



,m,n,0 \_^V,m',n' 

{V,m',n')&\\{0,S} 



{V ,mi ,n')&l\\{0,S} 



hl',m',n'){a{t),b{t),c{t)) 



^ n' Fi^rn,nfi[iO'iu),Ku),c{u)) = {l','m',n')]du, 



{l',m',n')€Z\\{0,S} 

where the last equahty follows by Fubini's theorem. 



□ 



Remark 5.8. A recursion 0/ IE;, m,n.o['S (''")] is given by Theorem 6of\7]. 



Corollary 5.9. The probability that no recombination events occur on an ARG until a 
sample has common ancestor at both of the two loci is 



^l,m,n,o[s{T) = 0] = IE/,m,n,0 



exp <5 -| J co{u)du 



Theorem 5.10. The probability that no recombination events occur on an ARG of a 
sample (0, 0, n) until the sample has common ancestor at both of the two loci is 

{n-l)\ 



(p+i; 



Proof. By (j3.ip we see that ao(t) = boit) = for all t and the marginal process {co(t); t > 
0} is a death process with death rate i{i — l)/2 when co(t) = i. Let a hitting time be 
7 = inf{s > 0; a{s) = n — 1}. By Corollary 15.91 



IEo,0,n,0 
Eo,0,n,0 <! E 



exp --^ j co{u)du 



exp 



P 



TO 



co{u)du 



7 



En 



0,0,n,0 S Eo,0,n-l,0 



P 



TO 



exp <; -- I co{u)du 



e 2 



E, 



0,0,n-l,0 



exp <; --^ J co{u)du 



n — 1 

n — 1 + yO' 
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where the third equahty follows by the strong Markov property, with the boundary con- 
dition 



0,0,1,0 



TO 



exp <! / co{u)du 







1. 



2 

The recursion solved immediately and we get Theorem 15.101 □ 



Theorem 5.11. The probability that no recombination events occur on an ARG of a 
sample (n, 0, 1) until the sample has common ancestor at both of the two loci is 

A ^(^ + 1) 

+ + 

Proof. By (j3.ip we see that bo{t) = and co{t) = 1 for all t, and the marginal process 
{ao(t); t > 0} is a death process with death rate i{i+l)/2 when ao(t) = i- By Corollarv l5.9l 
we have 

Pn,0,i[5(r) =0] =E„[e-^°]. 
The expression follows by a similar argument as used in Proof of Theorem I5.10[ □ 

Finally, let us consider the limit p — )• oo. For the purpose we introduce two processes. 
The one is a diffusion process {xoo{t),yoo{'t)',t > 0} in [0, 1]^ with a generator 

_ x(l-x) y(l - y) 



2 9x2 2 

and (xqo (0),yoo(0)) = {p,q). The other is a Markov chain {aoo(t), &oo(t); * > 0} in Zl\0 
whose transition rates are: 

(a -1,6), a(a -l)/2, 
(a,&-l), &(6-l)/2. 
and (aoo(0),&oo(0)) = Let t^ = {s> 0; (aoo(s), 6oo(s)) G {(1, 0), (0, 1), (1, 1)}}. 



(a, 6) 



Theorem 5.12 (Theorem 1 and 2 of [3]). // (xoo(O), yoo(O)) = i/ien {x(t),y(t);t > 

0} converges weaA;/y in C([0, cxd), [0, l]^) to {xoo(i), yoo(i); ^ > 0}, and {2;(t) — (ie~ 2*; t > 0} 
converges weakly in C([0, oo),M) to i/ze ^ero process in M. as p ^ 00. T/ie two function 
spaces are given the topology of uniform convergence on compact intervals. 
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Corollary 5.13. The probability generating function of the distribution of the numbers of 
ancestral lineages in a section of an ARG of a sample (a(0), 6(0), c(0)) = {l,m,n) has a 
limit 



oo. 



Proof. From Lemma l3. II we have 



E,.„,„b''"'9'""/"'l = E„,,[i(()'+'>(()"'+»] 



-Ep,,,,[x(t)'+"-y(t)-+"-^z(t)^i 



^ (n — i)lil 

It follows from Theorem 1 5 . 1 2 1 and Lebesgue's convergence theorem that 



iEp,,,,[^(t)'+"-*y(t)™+""*^(t)1 < K,,,Mty] ^0 p ^ oo, 

for t > 0, i = 1, 2, n, while 

Ep,,,,[x(t)'+"y(t)"^+"] ^ Ep[xoo(t)^+"]E,[yoc(t)'"+"] = E,+„[p'^-W]E„+„[g^-(*)]. 

as p — )• oo. The last equality is a result of the duality between {xoo(t);t > 0} and 
{aoo{t)]t > 0}, and between {yoo{t)]t > 0} and {6oo(i);^ ^ 0}- 



Corollary 15.131 shows that all AB gametes in a sample instantaneously split into a pair 
A— and —B gametes in the limit p — )• oo. Therefore the length of C lineages in an ARG 
goes to zero in this limit. 

Theorem 5.14. The expected lengths of C lineages and whole lineages of an ARG of a 
sample {l,m,n) until the sample has common ancestor at both of the two loci are 



(5.4) 

and 

(5.5) ^l,m,n 

respectively. 



c{u)du 



-E 



aoo{u)boo{u)du 



+ 



2n 



{a{u) + 6(n) + c{u))du 



E, 



'l,m 



(aoo(u) + boo{u))du 
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Proof. Let r]i m,n = lim IEz,m,n,o['S(r) = 0] and Xi m = 'ni,m,o- For {l,m,n) G Z^\0 we have 

p— >oo 

m,m,n = n + Xi+n,m+n forn > 1 and 

/(;-l)+m(m-l) ^ , /(/-I) , m(m-l) 
U — Lm Aim H '^/-l.m H 7: Aim-l, 



with the boundary condition Ai.o = Ao,i = Ai^i = 0. This boundary value problem is 
uniquely solved by means of the Feynman-Kac formula. That is 



From Corollary [521 have ()5.4p . Let 



{u)hoo{u)du 



VLm,n 



hm E, 

p— J-OO 



7, m, 71,0 



{a{u) + b{u) + c{u))du 



and A;_^ = r/^ ^^ Q. For {l,m,n) G Zi]_\0 we have rjlm,!! = ^z,m for n > 1 and 



= l + m 



l{l - 1) + m{m - 1) ^, 1(1-1)., 



m(m — 1) , 

A 



i,m— 1' 



with the boundary condition A']^ q = Aq i = A'^ i = 0. This boundary problem is also solved 
and we have ()5.5p . 

□ 
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